The purpose of this document is to detail the replication materials for "Twitter as Data".

########################################################
########################################################
#
# 	1. FILES FOR PROCESSING DATA
#
########################################################
########################################################

1. Script_SimulationOfDegreeCentralityMeasures_Final.py
	- This script simulates networks and saves out the network data.
	- Input: None
	- Output: Files in ReplicationMaterial/Data/SimulationData/

2. Script_CombineSifterData.R
	- This script combines the tweets from Egypt and Bahrain into one dataframe.
	- Input:
		1. Data/Tweets_Dataframe_FromSifter_OnlyBahrain.csv
			- Not provided; please contact me for it.
		2. Data/Tweets_Dataframe_FromSifter_OnlyEgypt.csv
	- Output:
		1. Data/Tweets_Dataframe_FromSifter_Merged.csv
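	- Sketch (not the R script itself): replicators who prefer Python could stack the two country files row-wise along the lines below. This is a minimal illustration using only the standard library; the function name combine_csv, the UTF-8 encoding, and the choice to keep the union of columns are assumptions, and the actual R script may clean or reorder columns differently.

```python
import csv


def combine_csv(paths, out_path):
    """Stack CSV files row-wise, keeping the union of their columns.

    Hypothetical helper: columns missing from one file are left empty.
    """
    rows, fieldnames = [], []
    for path in paths:
        with open(path, newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            # Record each column name the first time it appears.
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            rows.extend(reader)
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


# Example call with the file names listed above:
# combine_csv(
#     ["Data/Tweets_Dataframe_FromSifter_OnlyBahrain.csv",
#      "Data/Tweets_Dataframe_FromSifter_OnlyEgypt.csv"],
#     "Data/Tweets_Dataframe_FromSifter_Merged.csv",
# )
```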


3. Script_MakeLongitudinalFollowersDataframe.R
	- This script makes the dataframes that track change in followers and NCC by day.
	- Input:
		1. A series of .gml files, one per country-day.
	- Output:
		1. Data/LongitudinalFollowersDataframe.csv
		2. Data/LongitudinalNCCDataframe.csv
	- NB: I have not included the .gml files in my replication data because there are hundreds of them, and creating them is the most complicated part of the replication process (it eventually involves waiting weeks to download from Twitter's REST API). I am more than willing to provide the .gml files, and the code to create them (i.e. code to work with Twitter's REST API), to interested replicators. Until then, I hope that providing the code I used to generate the dataframe that tracks NCC by day, which is the main contribution of the paper, suffices.

4. Extract_tweetIDs_fromSifter.R
	- This script extracts the tweet IDs of the raw tweets.
	- Input:
		1. Dataframe of tweets from Sifter.
			- Cannot be shared because of Twitter's Terms of Service.
	- Output:
		1. Data/Sifter_tweetIDs.csv
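	- Sketch (not the R script itself): the extraction step amounts to pulling one column out of the tweet dataframe. The Python below is a minimal illustration; the column name "tweet_id" and the function name extract_tweet_ids are assumptions, since the Sifter dataframe's actual column names are not documented here. IDs are kept as strings on purpose, because tweet IDs are large enough that reading them as floating-point numbers can silently alter them.

```python
import csv


def extract_tweet_ids(in_path, out_path, id_column="tweet_id"):
    """Write the unique values of `id_column` to a one-column CSV.

    `id_column` is a hypothetical name; adjust it to the real header.
    IDs are returned in the order they first appear in the input.
    """
    seen = {}
    with open(in_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            tweet_id = (row.get(id_column) or "").strip()
            if tweet_id:
                # Dict keys preserve insertion order and drop duplicates.
                seen.setdefault(tweet_id, None)
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow([id_column])
        writer.writerows([tweet_id] for tweet_id in seen)
    return list(seen)
```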

5. DownloadTweetsBasedOnSifterID.py
	- This script takes the tweet identification numbers and downloads the corresponding tweets from Twitter's REST API.
	- For an explanation of how to use Twitter and more detail on this script, see Steinert-Threlkeld, Zachary C. 2017.  "Twitter as Data".  Cambridge University Press.
	- For replication, I have provided a cleaned version of the tweets from Egypt.  Simultaneously providing Bahrain's tweets would violate Twitter's terms of service; if you need Bahrain's tweets, e-mail me and I can provide them to you that way.
	- Input:
		1. Data/Sifter_tweetIDs.csv
	- Output:
		1. Data/tweets_downloadedTwython.txt
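	- Sketch (not the provided script): downloading tweets by ID is usually done in batches rather than one request per tweet. The Python below illustrates only the batching logic. The batch size of 100 is an assumption based on the per-request limit of Twitter's v1.1 status lookup endpoint, and lookup is a hypothetical callable standing in for whatever client call performs the request (with Twython, presumably a thin wrapper around its status lookup method). Tweets that have been deleted or protected are typically just absent from the response, so fewer tweets than IDs may come back.

```python
def batches(items, size=100):
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def download_tweets(tweet_ids, lookup, size=100):
    """Call `lookup` once per batch and collect the tweets it returns.

    `lookup` is a hypothetical callable: it receives a comma-separated
    string of tweet IDs and returns a list of tweets.
    """
    tweets = []
    for batch in batches(list(tweet_ids), size):
        tweets.extend(lookup(",".join(batch)))
    return tweets
```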


########################################################
########################################################
#
# 	2. PROVIDED DATASETS
#		- These are datasets not produced by a script above.  Data produced by the scripts above are also included in this paper's replication materials but are not detailed here.
########################################################
########################################################
1. Data/Tweets_Dataframe_FromSifter_OnlyEgypt.csv
	- Please contact me for the equivalent dataframe for Bahrain.

2. Data/Tweets_Dataframe_FromSifter_ForEgypt_f1Based_DayUseAggregated.csv.gz
	- Please contact me for the equivalent dataframe for Bahrain.
	- For detail on the topic models for these tweets, see Steinert-Threlkeld, Zachary C. 2017. Spontaneous Collective Action: Peripheral Mobilization During the Arab Spring.  American Political Science Review.
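	- Sketch: the .csv.gz file does not need to be decompressed by hand before use. For replicators inspecting it outside of R, the Python below reads a gzipped CSV directly with the standard library. The function name read_gzipped_csv and the UTF-8 encoding are assumptions; the limit argument is only there to make it easy to peek at the first few rows of a large file.

```python
import csv
import gzip


def read_gzipped_csv(path, limit=None):
    """Return rows of a gzip-compressed CSV as a list of dictionaries.

    If `limit` is given, stop after that many rows.
    """
    rows = []
    with gzip.open(path, "rt", newline="", encoding="utf-8") as handle:
        for index, row in enumerate(csv.DictReader(handle)):
            if limit is not None and index >= limit:
                break
            rows.append(row)
    return rows


# Example: peek at the first five rows of the Egypt file listed above.
# read_gzipped_csv(
#     "Data/Tweets_Dataframe_FromSifter_ForEgypt_f1Based_DayUseAggregated.csv.gz",
#     limit=5,
# )
```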



########################################################
########################################################
#
# 	3. FILES FOR MAIN PAPER
#
########################################################
########################################################

1. Script_Fig1.R
	- This script makes Figure 1.
	- Input: None
	- Output: Fig_1.pdf


2. Script_Fig2_Fig3.R
	- This script makes Figures 2 and 3.
	- Input:
		1. The output from Script_SimulationOfDegreeCentralityMeasures_Final.py.
			- The output of that file is a series of dataframes of simulated data.
				- Those data are in Data/SimulationData/ or could be recreated from the Python script.
	- Output:
		1. Figures 2a, 2b, and 2c.
			- NB: Other figures are produced that are not used in the paper.
		2. Figures 3a, 3b, and 3c.
			- NB: Other figures are produced that are not used in the paper.

3. Script_Fig4a.R
	- This script makes Figure 4a.
	- Input:
		1. Data/Tweets_Dataframe_FromSifter_Merged.csv
		2. Data/LongitudinalFollowersDataframe.csv
	- Output:
		1. Fig_4a.pdf

4. Script_Fig4b.R
	- This script makes Figure 4b.
	- Input:
		1. Data/Tweets_Dataframe_FromSifter_Merged.csv
		2. Data/LongitudinalFollowersDataframe.csv
	- Output:
		1. Fig_4b.pdf

5. Script_Fig5.R
	- This script makes Figures 5a and 5b.
	- Input:
		1. Data/LongitudinalNCCDataframe.csv
	- Output:
		1. Figure 5a
			- Name: Followers2_Sum_ChangeFor_Bahrain_JanApr2011_12ptfont.pdf
		2. Figure 5b
			- Name: Followers2_Sum_ChangeFor_Egypt_JanApr2011_12ptfont.pdf

6. Script_Fig6.R
	- This script makes Figures 6a and 6b.
	- Input:
		1. Data/LongitudinalFollowersDataframe.csv
		2. Data/LongitudinalNCCDataframe.csv
		3. Data/SimulationData/Power_Law_10000_2.089.csv
	- Output:
		1. Fig_6a.pdf
		2. Fig_6b.pdf

7. Script_Table2.R
	- This script creates the model of rank change of accounts based on followers and NCC.
	- Input:
		1. Data/LongitudinalFollowersDataframe.csv
		2. Data/LongitudinalNCCDataframe.csv
		3. Data/Tweets_Dataframe_FromSifter_ForEgypt_f1Based_DayUseAggregated.csv.gz
		4. Data/Tweets_Dataframe_FromSifter_ForBahrain_f1Based_DayUseAggregated.csv.gz
			- Contact me for this dataframe.
	- Output:
		1. ModelResults.tex


########################################################
########################################################
#
# 	4. FILES FOR SUPPLEMENTARY MATERIALS
#
########################################################
########################################################
1. Script_Fig2_Fig3.R
	- Makes the subfigures for Figures 1 and 2 of the SM.  Note that this is the same script used for Figures 2 and 3 in the main body; part of it also makes the first two figures of the SM.
	- Input:
		1. The output from Script_SimulationOfDegreeCentralityMeasures_Final.py.
			- The output of that file is a series of dataframes of simulated data.
				- Those data are in Data/SimulationData/ or could be recreated from the Python script.
	- Output:
		1. Figures SM 1a, 1b, 1c, and 1d.
			- Files are SampledNetwork_RankCorrelation_Closeness_###NodesKept.pdf
			- NB: Other figures are produced that are not used in the paper.
		2. Figures SM 2a, 2b, 2c, and 2d.
			- Files are SampledNetwork_RankCorrelation_PageRank_###NodesKept.pdf
			- NB: Other figures are produced that are not used in the paper.


2. Script_Fig_SM_4_5_LowerUpperBound.R
	- Makes Figures 4 and 5 of the SM.
	- Figure 3 is pseudocode created within the .tex document.
	- Input:
		1. Dataframes of accounts' followers, one per day.
			- NB: I do not provide these dataframes as part of the replication material because they are too numerous and large.  They can be provided upon request.
		2. LongitudinalFollowersDataframeFollowers2_BasedOnLowerBoundofCIUpperBound.csv
			- NB: This is the output from the first part of this script.
		3. Data/LongitudinalFollowersDataframe.csv
		4. Data/LongitudinalNCCDataframe.csv
	- Output:
		1. Fig_SM_5a
		2. Fig_SM_5b

3. Script_Fig_SM_6_8.R
	- Makes Figure 6 and Figure 8 for the SM.  It also makes a version of Figure 8 in which the linetype varies.
	- Input:
		1. Data/LongitudinalFollowersDataframeFollowers2_BasedOnLowerBoundofCIUpperBound
		2. Data/LongitudinalNCCDataframe.csv
	- Output:
		1. SM_Fig_6a, SM_Fig_6b - Note that I keep their original long file names because they are created in a loop.  Comments in the code indicate which parts create Figure 6.
		2. SM_Fig_8a, SM_Fig_8b - Note that I keep their original long file names because they are created in a loop.  Comments in the code indicate which parts create Figure 8.

4. Script_Fig_SM_7.R
	- Makes Figure 7 in the Supplementary Materials.
	- Input:
		1. Data/Tweets_Dataframe_FromSifter_Merged.csv
		2. Data/LongitudinalFollowersDataframe.csv
	- Output:
		1. Fig_SM_7.pdf

5. Script_Fig_SM_Fig9_Fig10.pdf
	- Makes network graphs of Bahrain and Egypt.
	- Input:
		1. Data/Egypt_Before_20110124toR_Dec2015.gml
		2. Data/Egypt_Before_20110404toR_Dec2015.gml
		3. Data/Bahrain_Before_20110124toR_Dec2015.gml
		4. Data/Bahrain_Before_20110404toR_Dec2015.gml
	- Output:
		1. Figure_SM_9.pdf
			- Name from script: International_IsolateOverlapCommunities_NetworkMap_Communities_Before20110124EdgeSame25EdgeDiff25_v3.pdf
		2. Figure_SM_10.pdf
			- Name from script: International_IsolateOverlapCommunities_NetworkMap_Communities_Before20110404EdgeSame25EdgeDiff25_v3.pdf
		3. Followers_DualCountry_20110124.csv
		4. Followers_DualCountry_20110404.csv










